
Numpy parquet faster #21

Closed
rom1504 wants to merge 4 commits from the numpy_parquet_faster branch

Conversation

@rom1504 (Owner) commented on Apr 19, 2022

Faster, but didn't solve the memleak.

computing safety predictions on top of clip embeddings
@rom1504 (Owner, Author) commented on Apr 20, 2022

This did solve the memleak, but the current mutex + dict implementation is complex and not very reliable.
Instead, handle this in the loader function of the iteration: prepare the file -> table mapping in advance, and clean up tables we no longer need using reference counting (a sketch follows below).
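A minimal sketch of one way to read that suggestion, assuming a hypothetical `pieces` list of `(filename, start_row, num_rows)` entries rather than the repo's actual data structure: the reference counts are prepared up front from the piece list, each file is read at most once, and a table is dropped as soon as the last piece that needs it has been yielded, so no mutex-guarded shared cache is required.

```python
from collections import Counter

import pyarrow.parquet as pq


def iterate_pieces(pieces):
    """Yield table slices for precomputed (filename, start_row, num_rows) pieces."""
    # Prepare the file -> table bookkeeping in advance: count how many pieces
    # reference each file so its table can be freed as soon as it is unused.
    refcounts = Counter(filename for filename, _, _ in pieces)
    tables = {}

    for filename, start_row, num_rows in pieces:
        if filename not in tables:
            tables[filename] = pq.read_table(filename)  # read each file once
        yield tables[filename].slice(start_row, num_rows)

        # Reference counting: release the table when no remaining piece needs it.
        refcounts[filename] -= 1
        if refcounts[filename] == 0:
            del tables[filename]
```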

@rom1504 (Owner, Author) commented on Apr 20, 2022

Alternatively, remove the parallelism completely from the parquet reading and instead use a simpler loader that relies on the precomputed pieces to know which slices to read.
It's worth benchmarking a few solutions for the parquet reading alone; a sketch of the sequential approach follows.
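A sketch of that simpler, single-threaded alternative, again with an assumed piece layout of `(filename, row_group_index)` pairs rather than whatever the repo actually precomputes:

```python
import pyarrow.parquet as pq


def sequential_loader(pieces, columns=None):
    """Read precomputed (filename, row_group_index) pieces one at a time, no threads."""
    for filename, row_group in pieces:
        parquet_file = pq.ParquetFile(filename)
        # use_threads=False keeps the read strictly sequential, which makes the
        # benchmark comparison against the threaded readers clean.
        yield parquet_file.read_row_group(row_group, columns=columns, use_threads=False)
```

Benchmarking this against the mutex + dict reader on the same parquet files would show whether the extra complexity of the threaded version actually pays off.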

@Veldrovive (Contributor) commented:
It might also be valuable to take into account down-the-line parallelization such as torch dataset workers. Maybe sequential parquet reading would be fine in that case (sketched below).
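A hedged sketch of that idea (the class name and the `(filename, row_group_index)` piece layout are assumptions for illustration, not the repo's API): each torch DataLoader worker gets its own shard of the precomputed pieces and reads them with plain sequential pyarrow calls, so the workers supply the parallelism.

```python
import pyarrow.parquet as pq
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class ParquetPieces(IterableDataset):
    """Iterate over precomputed (filename, row_group_index) pieces, one shard per worker."""

    def __init__(self, pieces, columns=None):
        self.pieces = pieces
        self.columns = columns

    def __iter__(self):
        worker = get_worker_info()
        # Shard the pieces across DataLoader workers; each worker reads sequentially.
        pieces = self.pieces if worker is None else self.pieces[worker.id :: worker.num_workers]
        for filename, row_group in pieces:
            table = pq.ParquetFile(filename).read_row_group(
                row_group, columns=self.columns, use_threads=False
            )
            yield table  # downstream code would turn the columns into tensors


# Usage: DataLoader(ParquetPieces(pieces), batch_size=None, num_workers=4)
```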

@rom1504 (Owner, Author) commented on May 15, 2022

Done in another PR.

@rom1504 rom1504 closed this May 15, 2022
@rom1504 rom1504 deleted the numpy_parquet_faster branch May 15, 2022 23:55